Active Sampling for Feature Selection
نویسندگان
چکیده
In many knowledge discovery applications the data mining step is followed by further data acquisition. New data may consist of new instances and/or new features for the old instances. When new features are to be added an acquisition policy can help decide what features have to be acquired based on their predictive capability and the cost of acquisition. This can be posed as a feature selection problem where the feature values are not known in advance. We propose a technique to actively sample the feature values with the ultimate goal of choosing between alternative candidate features with minimum sampling cost. Our algorithm is based on extracting candidate features in a “region” of the instance space where the feature value is likely to alter our knowledge the most. An experimental evaluation on a standard database shows that it is possible outperform a random subsampling policy in terms of the accuracy in feature selection.
منابع مشابه
Active Feature Selection Using Classes
Feature selection is frequently used in data pre-processing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling in feature selection in a filter model setting. Our objective is to partition data by taking advantage of class information so as to achieve the same or better perfo...
متن کاملA selective sampling approach to active feature selection
Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. Traditional feature selection methods resort to random sampling in dealing with data sets with a huge number of instances. In this paper, we introduce the concept of active feature sele...
متن کاملInterActive Feature Selection
We study the effects of feature selection and human feedback on features in active learning settings. Our experiments on a variety of text categorization tasks indicate that there is significant potential in improving classifier performance by feature reweighting, beyond that achieved via selective sampling alone (standard active learning) if we have access to an oracle that can point to the im...
متن کاملFeature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...
متن کاملA Resampling Technique for Relational Data Graphs
Resampling (a.k.a. bootstrapping) is a computationallyintensive statistical technique for estimating the sampling distribution of an estimator. Resampling is used in many machine learning algorithms, including ensemble methods, active learning, and feature selection. Resampling techniques generate pseudosamples from an underlying population by sampling with replacement from a single sample data...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003